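Programming Massively Parallel Processors: A Practical Guide: The Evolution of General-Purpose GPU Architecture

The move from NVIDIA's GT200 to the Fermi architecture marks the third generation of GPU computing. Earlier architectures were built around graphics processing and then "retrofitted" for mathematical computation, whereas Fermi was designed from the ground up for general-purpose GPU (GPGPU) applications.

Unlike GT200, which centered on texture units and strict data parallelism, Fermi introduced a unified memory request path. This shift opened the way to computational thinking: developers could move beyond simple 2D grid mappings toward developing complex C++ algorithms.

Fermi introduced a true L1/L2 cache hierarchy and complies with the IEEE 754-2008 floating-point standard. As a result, researchers no longer need to manually manage "scratchpad" memory (shared memory) for every byte, which makes irregular data structures practical and delivers the double-precision accuracy that scientific and engineering work requires.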
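
To make the precision point concrete, here is a minimal sketch of one operation that the 2008 revision of IEEE 754 standardized: fused multiply-add (FMA), which computes a * b + c with a single rounding instead of two. The sketch runs on the host CPU in Python rather than on a GPU, and it emulates the fused result with exact rational arithmetic. The function names `multiply_then_add` and `fused_multiply_add` are hypothetical and chosen only for this illustration.

```python
from fractions import Fraction


def multiply_then_add(a: float, b: float, c: float) -> float:
    """Unfused a * b + c: the product is rounded to double, then the sum is rounded again."""
    product = a * b        # first rounding
    return product + c     # second rounding


def fused_multiply_add(a: float, b: float, c: float) -> float:
    """Emulated fused multiply-add: exact a * b + c, rounded to double only once."""
    exact = Fraction(a) * Fraction(b) + Fraction(c)  # exact rational arithmetic, no rounding
    return float(exact)                              # the single, final rounding


if __name__ == "__main__":
    # a * b is exactly 1 + 2**-26 + 2**-54; the last term is lost when the
    # product is rounded to double, so the unfused result cancels to zero.
    a = b = 1.0 + 2.0 ** -27
    c = -(1.0 + 2.0 ** -26)
    print("unfused:", multiply_then_add(a, b, c))
    print("fused:  ", fused_multiply_add(a, b, c))
```

With these inputs the unfused version returns 0.0, while the fused version keeps the residual 2**-54. Small differences of this kind can accumulate in long-running numerical kernels, which is one reason standards-compliant double precision matters for scientific codes.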

QUESTION 1

Which architecture is considered the true start of the 'Third Generation' of GPU computing?

GT200 (Tesla)

Fermi

G80

Fixed-function Pipeline

QUESTION 2

What memory feature was introduced in Fermi to help handle irregular data patterns?

Manual Scratchpad only

Hardware-managed L1/L2 Cache Hierarchy

Write-only Texture Buffers

Disabling Global Memory

QUESTION 3

Fermi's compliance with IEEE 754-2008 was critical for which application type?

Simple 2D Sprite Rendering

High-precision Scientific Computing (FP64)

Text Scrolling

Basic Vertex Shading

QUESTION 4

What does 'Computational Thinking' refer to in the context of the Fermi shift?

Treating the GPU as a fixed-function signal processor.

Focusing on the physics of the problem rather than manual data movement.

Manually coding assembly for every pixel.

Using only 2D textures for storage.

QUESTION 5

How did Fermi improve thread management?

It removed the concept of Warps.

It introduced sophisticated hardware thread scheduling.

It limited threads to only 32 per GPU.

It forced all threads to run the same instruction forever.